{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab 3: groupby and more (Seaborn) plots\n",
    "\n",
    "This lab explores the FBI NICS Firearms Background Check data, which records the number of background checks made, by state and month.  A background check must be made prior to *some* sales of firearms (a big exception is private sales).  This data is often used as the best available approximation of total gun sales at a given time.\n",
    "\n",
    "BuzzFeed converts the PDF data supplied by the FBI to CSV files.\n",
    "\n",
    "For more information on the dataset: [https://github.com/BuzzFeedNews/nics-firearm-background-checks](https://github.com/BuzzFeedNews/nics-firearm-background-checks)\n",
    "\n",
    "For a direct link to the dataset (current as of July 2019):  [https://raw.githubusercontent.com/BuzzFeedNews/nics-firearm-background-checks/master/data/nics-firearm-background-checks.csv](https://raw.githubusercontent.com/BuzzFeedNews/nics-firearm-background-checks/master/data/nics-firearm-background-checks.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "\n",
    "pd.set_option('display.max_columns', None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read the CSV file into a dataframe called `guns`, and display the dataframe to make sure it was loaded correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
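  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to load the data, sketched under assumptions: the rows below are hypothetical (made-up numbers, and only a few of the real file's columns), held in memory so the example runs offline.  With network access, the raw GitHub URL above can be passed to `pd.read_csv` in place of the in-memory text.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample rows (made-up numbers) in the CSV's layout;\n",
    "# the real file has many more columns and rows.\n",
    "csv_text = \"\"\"month,state,permit,handgun,long_gun\n",
    "2019-06,New York,1200,3100,2900\n",
    "2019-06,Texas,5000,48000,30000\n",
    "2019-05,New York,1100,3300,2700\n",
    "\"\"\"\n",
    "\n",
    "# Parse the text as a CSV file; for the real data, pass the URL string instead.\n",
    "guns = pd.read_csv(io.StringIO(csv_text))\n",
    "\n",
    "# Show the first rows to check that the columns were parsed correctly.\n",
    "print(guns.head())\n",
    "```"
   ]
  },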
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make the `month` column into a `datetime` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
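  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the conversion with `pd.to_datetime`, assuming the `month` values are `YYYY-MM` strings as in the CSV.  The rows are hypothetical.  Passing `format=\"%Y-%m\"` states that layout explicitly instead of letting pandas guess it.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows; \"month\" holds year-month text, as in the CSV.\n",
    "guns = pd.DataFrame({\n",
    "    \"month\": [\"2019-06\", \"2019-05\", \"2019-04\"],\n",
    "    \"state\": [\"New York\", \"New York\", \"Texas\"],\n",
    "    \"handgun\": [3100, 3300, 45000],\n",
    "})\n",
    "\n",
    "# Parse the year-month strings into datetime64 values.\n",
    "guns[\"month\"] = pd.to_datetime(guns[\"month\"], format=\"%Y-%m\")\n",
    "\n",
    "print(guns.dtypes)\n",
    "```"
   ]
  },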
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There was no day in the original `month` column.  What happens to the day once we convert this column into a `datetime` object?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
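  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to check is sketched below, using a few hypothetical `YYYY-MM` strings.  After parsing, the `.dt` accessor exposes the day component of each date.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical year-month strings with no day.\n",
    "months = pd.Series([\"2019-06\", \"2019-05\", \"2018-12\"])\n",
    "dates = pd.to_datetime(months, format=\"%Y-%m\")\n",
    "\n",
    "# Inspect the day component of each parsed date.\n",
    "print(dates.dt.day.tolist())\n",
    "```\n",
    "\n",
    "Every day comes out as 1, because a date given without a day is placed on the first of its month."
   ]
  },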
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for the data, plot the number of handgun background checks (the `handgun` column) made in New York on the y axis and the date on the x axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
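  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch using pandas plotting.  It assumes `month` has already been converted to datetime, and the rows are hypothetical.  Sorting by `month` keeps the line running left to right in time.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows with \"month\" already converted to datetime.\n",
    "guns = pd.DataFrame({\n",
    "    \"month\": pd.to_datetime([\"2019-06\", \"2019-04\", \"2019-05\", \"2019-06\"], format=\"%Y-%m\"),\n",
    "    \"state\": [\"New York\", \"New York\", \"New York\", \"Texas\"],\n",
    "    \"handgun\": [3100, 2900, 3300, 45000],\n",
    "})\n",
    "\n",
    "# Keep only the New York rows, ordered by date.\n",
    "ny = guns[guns[\"state\"] == \"New York\"].sort_values(\"month\")\n",
    "\n",
    "# Line plot: date on the x axis, handgun checks on the y axis.\n",
    "ax = ny.plot(x=\"month\", y=\"handgun\", title=\"New York handgun background checks\")\n",
    "```"
   ]
  },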
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What do you notice about the plot?\n",
    "\n",
    "What was the mean number of handgun background checks in New York?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
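  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the filtered mean.  It assumes the question refers to New York, as in the plot above, and the rows are hypothetical.  Using `.loc` with a boolean mask and a column name filters the rows and selects the column in one step.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows (made-up numbers).\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"New York\", \"New York\", \"New York\", \"Texas\"],\n",
    "    \"handgun\": [3100, 3300, 2900, 45000],\n",
    "})\n",
    "\n",
    "# Keep the New York rows, take the handgun column, and average it.\n",
    "ny_mean = guns.loc[guns[\"state\"] == \"New York\", \"handgun\"].mean()\n",
    "print(ny_mean)\n",
    "```"
   ]
  },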
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Groupby\n",
    "\n",
    "\n",
    "What if we wanted to find the mean number of handgun checks for each state?  Filtering the data one state at a time would be tedious.  Instead we will use the *group by* process, which:\n",
    "- *splits* the data into groups based on some criteria\n",
    "- *applies* a function to each group independently\n",
    "- *combines* the results into a data structure\n",
    "\n",
    "The splitting step is done by `groupby()`.  A second function, like `mean()`, is then applied to each group, and pandas combines the per-group results into a single Series or DataFrame."
   ]
  },
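  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the three steps concrete, the sketch below does them by hand on hypothetical rows and compares the result with a single `groupby()` call.  Iterating over a `groupby` object yields `(group key, sub-DataFrame)` pairs.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows (made-up numbers).\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"New York\", \"Texas\", \"New York\", \"Texas\", \"Ohio\"],\n",
    "    \"handgun\": [3100, 45000, 2900, 47000, 20000],\n",
    "})\n",
    "\n",
    "# Split: one sub-DataFrame per state.\n",
    "pieces = {state: df for state, df in guns.groupby(\"state\")}\n",
    "# Apply: compute each piece's mean handgun count independently.\n",
    "means = {state: df[\"handgun\"].mean() for state, df in pieces.items()}\n",
    "# Combine: gather the per-group results into one Series indexed by state.\n",
    "manual = pd.Series(means).sort_index()\n",
    "\n",
    "# groupby() followed by mean() performs all three steps at once.\n",
    "automatic = guns.groupby(\"state\")[\"handgun\"].mean()\n",
    "\n",
    "print(manual)\n",
    "print(automatic)\n",
    "```"
   ]
  },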
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
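    "# numeric_only=True averages only the numeric count columns, skipping the datetime month column\n",
    "guns.groupby(\"state\").mean(numeric_only=True)"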
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we only want the `handgun` column, we can select it before applying `mean()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guns.groupby(\"state\")[\"handgun\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Functions we can use with `groupby()` include:\n",
    "- `mean()` : Compute mean of groups\n",
    "- `sum()` : Compute sum of group values\n",
    "- `size()` : Compute group sizes (number of rows, including missing values)\n",
    "- `count()` : Count the non-missing values in each group\n",
    "- `std()` : Compute standard deviation of groups\n",
    "- `var()` : Compute variance of groups\n",
    "- `describe()` : Generate descriptive statistics\n",
    "- `min()` : Compute min of group values\n",
    "- `max()` : Compute max of group values\n",
    "\n",
    "For example, what is the standard deviation of long gun background checks in each state?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
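  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below uses hypothetical rows, with one long gun value deliberately missing.  It shows `std()` computed per state, along with how `size()` and `count()` differ when a group contains missing values.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows; one Texas long gun value is missing (NaN).\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"New York\", \"New York\", \"Texas\", \"Texas\", \"Texas\"],\n",
    "    \"long_gun\": [2900, 2700, 30000, 32000, np.nan],\n",
    "})\n",
    "\n",
    "grouped = guns.groupby(\"state\")\n",
    "\n",
    "# Sample standard deviation of long gun checks within each state (NaN skipped).\n",
    "stds = grouped[\"long_gun\"].std()\n",
    "\n",
    "# size() counts rows per group; count() counts only non-missing values.\n",
    "sizes = grouped.size()\n",
    "counts = grouped[\"long_gun\"].count()\n",
    "\n",
    "print(stds)\n",
    "print(sizes)\n",
    "print(counts)\n",
    "```"
   ]
  },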
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the per-state handgun means computed above (a Series indexed by state) look a lot like the output of `value_counts()`.  We can use them to make a bar plot.  Try it below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guns.groupby(\"state\")[\"handgun\"].mean().plot.bar()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details> <summary>Answer:</summary>\n",
    "<code>guns.groupby(\"state\")[\"handgun\"].mean().plot.bar()</code>\n",
    "</details>\n",
    "\n",
    "We can also use `groupby` with dates.  For example, to sum the checks by calendar month (combining all years):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
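    "# Group by calendar month (1-12), pooling all years and states;\n",
    "# numeric_only=True skips the text state column and the datetime month column\n",
    "guns.groupby(guns[\"month\"].dt.month).sum(numeric_only=True)"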
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which month has the most background checks for long guns?  For handguns?"
   ]
  },
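  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to read the answer off the monthly totals is sketched below on hypothetical rows.  `idxmax()` returns the index label (here, the month number) of the largest value in a column.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows (made-up numbers).\n",
    "guns = pd.DataFrame({\n",
    "    \"month\": pd.to_datetime([\"2018-11\", \"2018-12\", \"2019-11\", \"2019-12\", \"2019-06\"], format=\"%Y-%m\"),\n",
    "    \"state\": [\"Texas\", \"Texas\", \"Ohio\", \"Ohio\", \"Texas\"],\n",
    "    \"handgun\": [40000, 60000, 20000, 35000, 30000],\n",
    "    \"long_gun\": [50000, 70000, 25000, 30000, 20000],\n",
    "})\n",
    "\n",
    "# Total the numeric columns for each calendar month (1-12).\n",
    "by_month = guns.groupby(guns[\"month\"].dt.month).sum(numeric_only=True)\n",
    "\n",
    "# Month number with the largest long gun and handgun totals.\n",
    "print(by_month[\"long_gun\"].idxmax(), by_month[\"handgun\"].idxmax())\n",
    "```"
   ]
  },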
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Seaborn plotting\n",
    "\n",
    "[Seaborn](https://seaborn.pydata.org) is a Python data visualization library, built on matplotlib, for creating attractive statistical plots.\n",
    "\n",
    "For example, suppose we want to make a scatter plot but use size and color to add more information to the plot.\n",
    "\n",
    "In Pandas, make a scatter plot with number of handgun background checks on the x axis and number of long gun background checks on the y axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the same plot in Seaborn, we use the code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.relplot(x=\"handgun\", y=\"long_gun\", data=guns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To color the points by the state:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.relplot(x=\"handgun\", y=\"long_gun\", hue=\"state\", data=guns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This plot is a little hard to interpret, so let's make a smaller dataset, called `guns5`, with only 5 states (whichever 5 you would like)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
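  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the five-state subset, assuming the name `guns5` used in the next plot.  The rows and the five states chosen are arbitrary examples.  `isin()` builds a True/False mask that keeps only the rows from the listed states.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows (made-up numbers).\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"Texas\", \"Ohio\", \"Alaska\", \"Maine\", \"Utah\", \"Iowa\"],\n",
    "    \"handgun\": [45000, 20000, 3000, 2500, 9000, 4000],\n",
    "    \"long_gun\": [30000, 18000, 4000, 3500, 7000, 6000],\n",
    "})\n",
    "\n",
    "# Any five states work; these are an arbitrary choice.\n",
    "states5 = [\"Texas\", \"Ohio\", \"Alaska\", \"Maine\", \"Utah\"]\n",
    "\n",
    "# Keep only the rows whose state is in the list.\n",
    "guns5 = guns[guns[\"state\"].isin(states5)]\n",
    "print(guns5[\"state\"].unique())\n",
    "```"
   ]
  },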
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To size the circles by the total number of permit checks made that month:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.relplot(x=\"handgun\", y=\"long_gun\", hue=\"state\", size=\"permit\", data=guns5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are some large handgun and long gun background check values.  What state are they from?\n",
    "\n",
    "What are the maximum values in the `handgun` and `long_gun` columns?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
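  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch on hypothetical rows, with placeholder state names and made-up numbers.  Calling `max()` on a two-column selection returns both column maxima at once.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows with placeholder state names.\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"State A\", \"State B\", \"State C\"],\n",
    "    \"handgun\": [45000, 20000, 30000],\n",
    "    \"long_gun\": [30000, 18000, 90000],\n",
    "})\n",
    "\n",
    "# Column-wise maxima of the two columns.\n",
    "maxima = guns[[\"handgun\", \"long_gun\"]].max()\n",
    "print(maxima)\n",
    "```"
   ]
  },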
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's find a row containing the median handgun value 3280:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guns.loc[guns[\"handgun\"] == 3280]"
   ]
  },
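  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rather than typing the median in by hand, you can compute it and reuse the result, as sketched here on hypothetical rows.  With an even number of rows, `median()` averages the two middle values, so the result may not match any row exactly.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows with placeholder state names.\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"State A\", \"State B\", \"State C\"],\n",
    "    \"handgun\": [1000, 3280, 9000],\n",
    "})\n",
    "\n",
    "# Compute the median, then select the rows whose value equals it.\n",
    "median_handgun = guns[\"handgun\"].median()\n",
    "rows = guns.loc[guns[\"handgun\"] == median_handgun]\n",
    "\n",
    "print(median_handgun)\n",
    "print(rows)\n",
    "```"
   ]
  },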
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now find the rows containing the maximum `handgun` and `long_gun` values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
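  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two common approaches are sketched below on hypothetical rows.  A boolean mask keeps every row that ties for the maximum.  `idxmax()` returns only the index label of the first maximum, which `.loc` then uses to fetch that row.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows with placeholder state names.\n",
    "guns = pd.DataFrame({\n",
    "    \"state\": [\"State A\", \"State B\", \"State C\"],\n",
    "    \"handgun\": [45000, 20000, 30000],\n",
    "    \"long_gun\": [30000, 18000, 90000],\n",
    "})\n",
    "\n",
    "# Boolean mask: all rows whose handgun value equals the column maximum.\n",
    "max_handgun_rows = guns.loc[guns[\"handgun\"] == guns[\"handgun\"].max()]\n",
    "\n",
    "# idxmax(): label of the first row with the largest long gun value.\n",
    "max_long_gun_row = guns.loc[guns[\"long_gun\"].idxmax()]\n",
    "\n",
    "print(max_handgun_rows)\n",
    "print(max_long_gun_row)\n",
    "```"
   ]
  },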
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenges\n",
    "\n",
    "- Make a hexagonal bin (hexbin) plot of just the Texas handgun vs. long gun background check numbers.\n",
    "- Choose another Seaborn plot from the [gallery](https://seaborn.pydata.org/examples/index.html).  Can you make it using the background check data?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}